Virtualized data storage system cache management

ABSTRACT

Virtual storage arrays consolidate branch data storage at data centers connected via wide area networks. Virtual storage arrays appear to storage clients as local data storage; however, virtual storage arrays actually store data at the data center. The virtual storage arrays overcome bandwidth and latency limitations of the wide area network by predicting and prefetching storage blocks, which are then cached at the branch location. Virtual storage arrays leverage an understanding of the semantics and structure of high-level data structures associated with storage blocks to predict which storage blocks are likely to be requested by a storage client in the near future. Virtual storage arrays determine the association between requested storage blocks and corresponding high-level data structure entities to predict additional high-level data structure entities that are likely to be accessed. From this, the virtual storage array identifies the additional storage blocks for prefetching.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. Provisional Patent Application No. 61/162,463, entitled “Virtualized Data Storage Over Wide-Area Networks”, filed Mar. 23, 2009; U.S. patent application Ser. No. ______ [Attorney Docket Number R001110US], entitled “Virtualized Data Storage Over Wide-Area Networks”, filed ______; U.S. patent application Ser. No. ______ [Attorney Docket Number R001410US], entitled “Virtualized Data Storage System Architecture”, filed ______; and U.S. patent application Ser. No. ______ [Attorney Docket Number R001420US], entitled “Virtual Data Storage System Optimizations”, filed ______; all of which are incorporated by reference herein for all purposes.

BACKGROUND

The present invention relates generally to data storage systems, and systems and methods to improve storage efficiency, compactness, performance, reliability, and compatibility. Enterprises often span geographical locations, including multiple corporate sites, branch offices, and data centers, all of which are generally connected over a wide-area network (WAN). Although in many cases servers are run in a data center and accessed over the network, there are also cases in which servers need to be run in distributed locations at the “edges” of the network. These network edge locations are generally referred to as branch locations in this application, regardless of the purposes of these locations. The need to operate servers at branch locations may arise from a variety of reasons, including efficiently handling large amounts of newly written data and ensuring service availability during WAN outages.

The need to run servers at branch locations in a network, as opposed to a centralized data center location, leads to a corresponding requirement for data storage for those servers at the branch locations, both to store the operating system data for branch servers and, in some cases, to store user or application data. The branch data storage requires maintenance and administration, including proper sizing for future growth, data snapshots, archives, and backups, and replacements and/or upgrades of storage hardware and software when the storage hardware or software fails or branch data storage requirements change.

Although the maintenance and administration of data storage in general incurs additional costs, branch data storage is more expensive and inefficient than consolidated data storage at a centralized data center. Organizations often require on-site personnel at each branch location to configure and upgrade each branch's data storage, and to manage data backups and data retention. Additionally, organizations often purchase excess storage capacity for each branch location to allow for upgrades and growing data storage requirements. Because branch locations are serviced infrequently, due to their numbers and geographic dispersion, organizations often deploy enough data storage at each branch location to allow for months or years of storage growth. However, this excess storage capacity often sits unused for months or years until it is needed, unnecessarily driving up costs.

Although the consolidation of information technology infrastructure decreases costs and improves management efficiency, branch data storage is rarely consolidated at a data center, because the intervening WAN is slow and has high latency, making storage accesses unacceptably slow for branch client systems and application servers. Thus, organizations have previously been unable to consolidate data storage from multiple branches.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will be described with reference to the drawings, in which:

FIG. 1 illustrates a virtualized data storage system architecture according to an embodiment of the invention;

FIGS. 2A-2B illustrate methods of prefetching storage blocks to improve virtualized data storage system performance according to embodiments of the invention;

FIG. 3 illustrates a method of processing storage block write requests to improve virtualized data storage system performance according to an embodiment of the invention;

FIGS. 4A-4C illustrate write order preservation policies according to embodiments of the invention;

FIG. 5 illustrates an arrangement for recursively applying transformations and optimizations to improve virtualized data storage system performance according to an embodiment of the invention;

FIG. 6 illustrates a method of creating a data storage snapshot in a virtualized data storage system according to an embodiment of the invention; and

FIG. 7 illustrates an example computer system capable of implementing a virtualized data storage system device according to an embodiment of the invention.

SUMMARY

An embodiment of the invention uses virtual storage arrays to consolidate branch location-specific data storage at data centers connected with branch locations via wide area networks. The virtual storage array appears to a storage client as local branch data storage; however, embodiments of the invention actually store the virtual storage array data at a data center connected with the branch location via a wide-area network. In embodiments of the invention, a branch storage client accesses the virtual storage array using storage block-based protocols.

Embodiments of the invention overcome the bandwidth and latency limitations of the wide area network between branch locations and the data center by predicting storage blocks likely to be requested in the future by the branch storage client and prefetching and caching these predicted storage blocks at the branch location. When this prediction is successful, storage block requests from the branch storage client may be fulfilled in whole or in part from the branch location's storage block cache. As a result, the latency and bandwidth restrictions of the wide-area network are hidden from the storage client.

The branch location storage client uses storage block-based protocols to specify reads, writes, modifications, and/or deletions of storage blocks. However, servers and higher-level applications typically access data in terms of files in a structured file system, relational database, or other high-level data structure. Each entity in the high-level data structure, such as a file or directory, or database table, node, or row, may be spread out over multiple storage blocks at various non-contiguous locations in the storage device. Thus, prefetching storage blocks based solely on their locations in the storage device is unlikely to be effective in hiding wide-area network latency and bandwidth limits from storage clients.

An embodiment of the invention leverages an understanding of the semantics and structure of the high-level data structures associated with the storage blocks to predict which storage blocks are likely to be requested by a storage client in the near future. To do this, an embodiment of the invention determines the association between requested storage blocks and the corresponding high-level data structure entities, such as files, directories, or database elements. Once this embodiment has identified one or more of the high-level data structure entities associated with a requested storage block, this embodiment of the invention identifies additional portions of the same or other high-level data structure entities that are likely to be accessed by the storage client. This embodiment of the invention then identifies the additional storage blocks corresponding to these additional high-level data structure entities. The additional storage blocks are then prefetched and cached at the branch location.

Another embodiment of the invention analyzes a selected high-level data structure entity to identify portions of the same or other high-level data structure entities that are likely to be accessed by the storage client. This embodiment of the invention then identifies the additional storage blocks corresponding to these additional high-level data structure entities. The additional storage blocks are then prefetched and cached at the branch location. This embodiment of the invention may also identify additional high-level data structure entities to analyze based on its analysis of previously selected high-level data structure entities.

Further embodiments of the invention may identify corresponding high-level data structure entities directly from requests for storage blocks. Additionally, embodiments of the invention may apply any number of successive transformations to storage block requests to identify associated high-level data structure entities. These successive transformations may include transformations to intermediate-level data structure entities. Intermediate and high-level data structure entities may include virtual machine data structures, such as virtual machine file system files, virtual machine file system storage blocks, virtual machine storage structures, and virtual machine disk images.

DETAILED DESCRIPTION OF ILLUSTRATIVE EMBODIMENTS

FIG. 1 illustrates a virtualized data storage system architecture 100 according to an embodiment of the invention. Virtualized data storage system architecture 100 includes a data center 101 connected with at least one branch network location 102 via a wide-area network (WAN) 130. Each branch location 102 includes at least one storage client 139, such as a file server, application server, database server, or storage area network (SAN) interface. A storage client 139 may be connected with a local-area network (LAN) 151, including routers, switches, and other wired or wireless network devices, for connecting with server and client systems and other devices 152.

Previously, typical branch location installations also required a local physical data storage device for the storage client. For example, a prior typical branch location LAN installation may include a file server for storing data for the client systems and application servers, such as database servers and e-mail servers. In prior systems, this branch location's data storage is located at the branch location site and connected directly with the branch location LAN or SAN. The branch location physical data storage device previously could not be located at the data center 101, because the intervening WAN 130 is too slow and has high latency, making storage accesses unacceptably slow for storage clients.

An embodiment of the invention allows for storage consolidation of branch location-specific data storage at data centers connected with branch locations via wide area networks. This embodiment of the invention overcomes the bandwidth and latency limitations of the wide area network between branch locations and the data center. To this end, an embodiment of the invention includes virtual storage arrays.

In an embodiment, the branch location 102 includes a virtual storage array interface device 135. The virtual storage array interface device 135 presents a virtual storage array 137 to branch location users, such as the branch location storage client 139. A virtual storage array 137 can be used for the same purposes as a local storage area network or other data storage device. For example, a virtual storage array 137 may be used in conjunction with a file server for general-purpose data storage, in conjunction with a database server for database application storage, or in conjunction with an e-mail server for e-mail storage. However, the virtual storage array 137 stores its data at a data center 101 connected with the branch location 102 via a wide area network 130. Multiple separate virtual storage arrays, from different branch locations, may store their data in the same data center and, as described below, on the same physical storage devices.

Because the data storage of multiple branch locations is consolidated at a data center, the efficiency, reliability, cost-effectiveness, and performance of data storage are improved. An organization can manage and control access to its data storage at a central data center, rather than at large numbers of separate branch locations. This increases the reliability and performance of an organization's data storage. This also reduces the personnel required at branch location offices to provision, maintain, and back up data storage. It also enables organizations to implement more effective backup systems, data snapshots, and disaster recovery for their data storage. Furthermore, organizations can plan for storage growth more efficiently, by consolidating their storage expansion for multiple branch locations and reducing the amount of excess unused storage. Additionally, an organization can apply optimizations such as compression or data deduplication over the data from multiple branch locations stored at the data center, reducing the total amount of storage required by the organization.

In an embodiment, virtual storage array interface 135 may be a stand-alone computer system or network appliance or built into other computer systems or network equipment as hardware and/or software. In a further embodiment, a branch location virtual storage array interface 135 may be implemented as a software application or other executable code running on a client system or application server.

In an embodiment, a branch location virtual storage array interface 135 includes one or more storage array network interfaces and supports one or more storage block network protocols to connect with one or more storage clients 139 via a local storage area network (SAN) 138. Examples of storage array network interfaces suitable for use with embodiments of the invention include Ethernet, Fibre Channel, IP, and InfiniBand interfaces. Examples of storage array network protocols include ATA, Fibre Channel Protocol, and SCSI. Various combinations of storage array network interfaces and protocols are suitable for use with embodiments of the invention, including iSCSI, HyperSCSI, Fibre Channel over Ethernet, and iFCP. In cases where the storage array network interface uses Ethernet, an embodiment of the branch location virtual storage array interface can use the branch location LAN's physical connections and networking equipment for communicating with client systems and application services. In other embodiments, separate connections and networking equipment, such as Fibre Channel networking equipment, are used to connect the branch location virtual storage array interface with client systems and/or application services.

It should be noted that the branch location virtual storage array interface 135 allows storage clients to access data in the virtual storage array via storage block protocols, unlike file servers that utilize file-based protocols. Thus, the virtual storage array 137 may be accessed by any type of storage client in the same manner as a local physical storage device or storage array. Furthermore, applications executed by the storage client 139 or other client and server systems 152 may access the virtual storage array in the same manner as a local physical storage device or storage array.

In an embodiment, the storage client 139 is included in a file server that also provides a network file interface to the virtual storage array 137 to client systems and other application servers. In a further embodiment, the branch location virtual storage array interface 135 is integrated as hardware and/or software with an application server, such as a file server, database server, or e-mail server. In this embodiment, the branch location virtual storage array interface 135 can include application server interfaces, such as a network file interface, for interfacing with other application servers and/or client systems.

A branch location virtual storage array interface 135 presents a virtual storage array 137 to one or more storage clients 139. To the storage client 139, the virtual storage array 137 appears to be a local storage array, having its physical data storage at the branch location 102. However, the branch location virtual storage array interface 135 actually stores and retrieves data from physical data storage devices located at the data center 101. Because virtual storage array data accesses must travel via the WAN 130 between the data center 101 LAN and a branch location 102 LAN, the virtual storage array 137 is subject to the latency and bandwidth restrictions of the WAN 130.

In an embodiment, the branch location virtual storage array interface 135 includes a virtual storage array cache 145, which is used to ameliorate the effects of the WAN 130 on virtual storage array 137 performance. In an embodiment, the virtual storage array cache 145 includes a storage block read cache 147 and a storage block write cache 149.

The storage block read cache 147 is adapted to store local copies of storage blocks requested by storage client 139. As described in detail below, the virtualized data storage system architecture 100 may attempt to predict which storage blocks will be requested by the storage client 139 in the future and preemptively send these predicted storage blocks from the data center 101 to the branch 102 via WAN 130 for storage in the storage block read cache 147. If this prediction is partially or wholly correct, then when the storage client 139 eventually requests one or more of these prefetched storage blocks from the virtual storage array 137, an embodiment of the virtual storage array interface 135 can fulfill this request using local copies of the requested storage blocks from the block read cache 147. By fulfilling access requests using prefetched local copies of storage blocks from the block read cache 147, the latency and bandwidth restrictions of WAN 130 are hidden from the storage client 139. Thus, from the perspective of the storage client 139, the virtual storage array 137 appears to perform storage block read operations as if the physical data storage were located at the branch location 102.
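The read path described above can be illustrated with a minimal Python sketch. It is not taken from the specification: the class name BlockReadCache, the fetch_over_wan callback, and the least-recently-used eviction rule are all assumptions made for illustration, since the text does not state how the block read cache 147 is sized or evicted.

```python
from collections import OrderedDict


class BlockReadCache:
    """Hypothetical branch-side read cache keyed by storage block address.

    LRU eviction is an assumption of this sketch; the source text does not
    specify an eviction policy.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self._blocks = OrderedDict()  # address -> block data, oldest first
        self.hits = 0
        self.misses = 0

    def store(self, address, data):
        """Insert a prefetched or fetched block, evicting the oldest if full."""
        self._blocks[address] = data
        self._blocks.move_to_end(address)
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)

    def read(self, address, fetch_over_wan):
        """Serve a block locally when cached; otherwise fetch it over the WAN."""
        if address in self._blocks:
            self.hits += 1
            self._blocks.move_to_end(address)
            return self._blocks[address]
        self.misses += 1
        data = fetch_over_wan(address)  # slow path: round trip to data center
        self.store(address, data)
        return data
```

In this sketch a successful prediction shows up as a call to read() that returns without invoking fetch_over_wan, which corresponds to the case in which the WAN latency is hidden from the storage client.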

Similarly, the storage block write cache 149 is adapted to store local copies of new or updated storage blocks written by the storage client 139. As described in detail below, the storage block write cache 149 temporarily stores new or updated storage blocks written by the storage client 139 until these storage blocks are copied back to physical data storage at the data center 101 via WAN 130. By temporarily storing new and updated storage blocks locally at the branch location 102, the bandwidth and latency of the WAN 130 are hidden from the storage client 139. Thus, from the perspective of the storage client 139, the virtual storage array 137 appears to perform storage block write operations as if the physical data storage were located at the branch location 102.
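As a rough illustration of the write path, the following Python sketch acknowledges writes locally and copies them to the data center later. The names BlockWriteCache and send_to_data_center are hypothetical, and flushing in arrival order is an assumption of this sketch rather than a statement of the write order preservation policies of FIGS. 4A-4C.

```python
from collections import deque


class BlockWriteCache:
    """Hypothetical branch-side write-back cache for storage blocks."""

    def __init__(self):
        self._pending = deque()  # (address, data) in arrival order
        self._latest = {}        # address -> newest unflushed data

    def write(self, address, data):
        """Record the write locally and acknowledge without a WAN round trip."""
        self._pending.append((address, data))
        self._latest[address] = data
        return True

    def read_pending(self, address):
        """Return the newest unflushed data for an address, or None."""
        return self._latest.get(address)

    def flush(self, send_to_data_center):
        """Copy pending writes to the data center in arrival order."""
        sent = 0
        while self._pending:
            address, data = self._pending[0]
            send_to_data_center(address, data)  # if this raises, entry stays queued
            self._pending.popleft()
            sent += 1
        self._latest.clear()
        return sent
```

A production design would also need the non-volatile or redundant storage that the specification mentions for protecting unflushed blocks; this in-memory sketch omits that.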

In an embodiment, the virtual storage array cache 145 includes non-volatile and/or redundant data storage, so that data in new or updated storage blocks are protected from system failures until they can be transferred over the WAN 130 and stored in physical data storage at the data center 101.

In an embodiment, the branch location virtual storage array interface 135 operates in conjunction with a data center virtual storage array interface 107. The data center virtual storage array interface 107 is located on the data center 101 LAN and may communicate with one or more branch location virtual storage array interfaces via the data center 101 LAN, the WAN 130, and their respective branch location LANs. Data communications between virtual storage array interfaces can be in any form and/or protocol used for carrying data over wired and wireless data communications networks, including TCP/IP.

In an embodiment, data center virtual storage array interface 107 is connected with one or more physical data storage devices 103 to store and retrieve data for one or more virtual storage arrays, such as virtual storage array 137. To this end, an embodiment of a data center virtual storage array interface 107 accesses a physical storage array network interface, which in turn accesses physical data storage array 103 a on a storage array network (SAN) 105. In another embodiment, the data center virtual storage array interface 107 includes one or more storage array network interfaces and supports one or more storage array network protocols for directly connecting with a physical storage array network 105 and its physical data storage array 103 a. Examples of storage array network interfaces suitable for use with embodiments of the invention include Ethernet, Fibre Channel, IP, and InfiniBand interfaces. Examples of storage array network protocols include ATA, Fibre Channel Protocol, and SCSI. Various combinations of storage array network interfaces and protocols are suitable for use with embodiments of the invention, including iSCSI, HyperSCSI, Fibre Channel over Ethernet, and iFCP. Embodiments of the data center virtual storage array interface 107 may connect with the physical storage array interface and/or directly with the physical storage array network 105 using the Ethernet network of the data center LAN and/or separate data communications connections, such as a Fibre Channel network.

In another embodiment, data center virtual storage array interface 107 may store and retrieve data for one or more virtual storage arrays, such as virtual storage array 137, using a network storage device, such as file server 103 b. File server 103 b may be connected with data center virtual storage array interface 107 via local-area network (LAN) 115, such as an Ethernet network, and communicate using a network file system protocol, such as NFS, SMB, or CIFS.

Embodiments of the data center virtual storage array interface 107 may utilize a number of different arrangements to store and retrieve virtual storage array data with physical data storage array 103 a or file server 103 b. In one embodiment, the virtual data storage array 137 presents a virtualized logical storage unit, such as an iSCSI or Fibre Channel logical unit number (LUN), to storage client 139. This virtual logical storage unit is mapped to a corresponding logical storage unit 104 a on physical data storage array 103 a. Data center virtual storage array interface 107 stores and retrieves data for this virtualized logical storage unit using a non-virtual logical storage unit 104 a provided by physical data storage array 103 a. In a further embodiment, the data center virtual data storage array interface 107 supports multiple branch locations and maps each storage client's virtualized logical storage unit to a different non-virtual logical storage unit provided by physical data storage array 103 a.

In another embodiment, virtual data storage array interface 107 maps a virtualized logical storage unit to a virtual machine file system 104 b, which is provided by the physical data storage array 103 a. Virtual machine file system 104 b is adapted to store one or more virtual machine disk images 113, each representing the configuration and optionally state and data of a virtual machine. Each of the virtual machine disk images 113, such as virtual machine disk images 113 a and 113 b, includes one or more virtual machine file systems to store applications and data of a virtual machine. To a virtual machine application, its virtual machine disk image 113 within the virtual machine file system 104 b appears as a logical storage unit. However, the complete virtual machine file system 104 b appears to the data center virtual storage array interface 107 as a single logical storage unit.

In another embodiment, virtual data storage array interface 107 maps a virtualized logical storage unit to a logical storage unit or file system 104 c provided by the file server 103 b.

As described above, storage clients can interact with virtual storage arrays in the same manner that they would interact with physical storage arrays. This includes issuing storage commands to the branch location virtual storage interface using storage array network protocols such as iSCSI or Fibre Channel protocol. Most storage array network protocols organize data according to storage blocks, each of which has a unique storage address or location. A storage block's unique storage address may include a logical unit number (using the SCSI protocol) or another representation of a logical volume.

In an embodiment, the virtual storage array provided by a branch location virtual storage interface allows a storage client to access storage blocks by their unique storage address within the virtual storage array. However, because one or more virtual storage arrays actually store their data within one or more of the physical data storage devices 103, an embodiment of the invention allows arbitrary mappings between the unique storage addresses of storage blocks in the virtual storage array and the corresponding unique storage addresses in one or more physical data storage devices 103. In an embodiment, the mapping between virtual and physical storage addresses may be performed by a branch location virtual storage array interface 135 and/or by data center virtual storage array interface 107. Furthermore, there may be multiple levels of mapping between the addresses of storage blocks in the virtual storage array and their corresponding addresses in the physical storage device.

In an embodiment, storage blocks in the virtual storage array may be of a different size and/or structure than the corresponding storage blocks in a physical storage array or data storage device. For example, if data compression is applied to the storage data, then the physical storage array data blocks may be smaller than the storage blocks of the virtual storage array to take advantage of data storage savings. In an embodiment, the branch location and/or data center virtual storage array interfaces map one or more virtual storage array storage blocks to one or more physical storage array storage blocks. Thus, a virtual storage array storage block can correspond with a fraction of a physical storage array storage block, a single physical storage array storage block, or multiple physical storage array storage blocks, as required by the configuration of the virtual and physical storage arrays.
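The three cases named above (a fraction of a physical block, exactly one physical block, or several physical blocks) can be worked through with a small Python sketch. It assumes a simple linear, contiguous layout, which is only one of the arbitrary mappings the specification allows; the function name map_virtual_block and its parameters are hypothetical.

```python
def map_virtual_block(vblock, vsize, psize):
    """Map one virtual block to a span of physical blocks.

    Assumes a linear layout in which virtual byte offset N is stored at
    physical byte offset N (an assumption of this sketch only).

    Returns (first_physical_block, byte_offset_within_it, physical_block_count).
    """
    start = vblock * vsize          # first byte of the virtual block
    end = start + vsize             # one past its last byte
    first = start // psize          # physical block holding the first byte
    last = (end - 1) // psize       # physical block holding the last byte
    offset = start - first * psize
    return first, offset, last - first + 1


# Virtual blocks larger than physical blocks: one maps to several.
print(map_virtual_block(3, 4096, 512))    # (24, 0, 8)
# Virtual blocks smaller than physical blocks: one maps to a fraction.
print(map_virtual_block(3, 4096, 16384))  # (0, 12288, 1)
# Equal sizes: one-to-one.
print(map_virtual_block(5, 4096, 4096))   # (5, 0, 1)
```

Where compression or multiple levels of mapping are in use, a lookup table would be needed in place of this arithmetic.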

In a further embodiment, the branch location and data center virtual storage array interfaces may reorder or regroup storage operations from storage clients to improve efficiency of data optimizations such as data compression. For example, if two storage clients are simultaneously accessing the same virtual storage array, then these storage operations will be intermixed when received by the branch location virtual storage array interface. An embodiment of the branch location and/or data center virtual storage array interface can reorder or regroup these storage operations according to storage client, type of storage operation, data or application type, or any other attribute or criteria to improve virtual storage array performance and efficiency. For example, a virtual storage array interface can group storage operations by storage client and apply data compression to each storage client's operations separately, which is likely to provide greater data compression than compressing all storage operations together.
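The grouping step in the last example can be sketched in Python as follows. The representation of an operation as a (client_id, payload) pair and the use of zlib are assumptions chosen for illustration; the specification does not name a compression algorithm, and the sketch makes no claim about the compression ratios achieved.

```python
import zlib
from collections import defaultdict


def compress_by_client(operations):
    """Regroup intermixed operations by storage client, then compress each group.

    `operations` is an iterable of (client_id, payload_bytes) pairs in the
    order received. Returns {client_id: compressed_bytes}.
    """
    grouped = defaultdict(list)
    for client_id, payload in operations:
        grouped[client_id].append(payload)  # preserves per-client order
    return {
        client_id: zlib.compress(b"".join(payloads))
        for client_id, payloads in grouped.items()
    }
```

The same pattern applies to the other grouping criteria mentioned, such as operation type or application type, by changing the key used to build the groups.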

As described above, an embodiment of the virtualized data storage system architecture 100 attempts to predict which storage blocks will be requested by a storage client in the near future, prefetches these storage blocks from the physical data storage devices 103, and forwards these to the branch location 102 for storage in the storage block read cache 147. When this prediction is successful and storage block requests may be fulfilled in whole or in part from the block read cache 147, the latency and bandwidth restrictions of the WAN 130 are hidden from the storage client. An embodiment of the virtualized data storage system architecture 100 includes a storage block access optimizer 120 to select storage blocks for prefetching to storage clients. In an embodiment, the storage block access optimizer 120 is located at the data center 101 and is connected with or incorporated into the data center virtual data storage array interface 107. In an alternate embodiment, the storage block access optimizer 120 may be located at the branch location 102 and be connected with or incorporated into the branch location virtual data storage interface 135.

As discussed above, storage devices such as physical data storage arrays and the virtual data storage array are accessed using storage block-based protocols. A storage block is a sequence of bytes or bits of data. Data storage devices represent their data storage as a set of storage blocks that may be used to store and retrieve data. The set of storage blocks is an abstraction of the underlying hardware of a physical or virtual data storage device. Storage clients use storage block-based protocols to specify reads, writes, modifications, and/or deletions of storage blocks. However, servers and higher-level applications typically access data in terms of files in a structured file system, relational database, or other high-level data structure. Each entity in the high-level data structure, such as a file or directory, or database table, node, or row, may be spread out over multiple storage blocks at various non-contiguous locations in the storage device. Thus, prefetching storage blocks based solely on their location in the storage device is unlikely to be effective in hiding WAN latency and bandwidth limits from storage clients.

In an embodiment, the storage block access optimizer 120 leverages an understanding of the semantics and structure of the high-level data structures associated with the storage blocks to predict which storage blocks are likely to be requested by a storage client in the near future. To do this, the storage block access optimizer 120 must be able to determine the association between storage blocks and their high-level data structure. An embodiment of the storage block access optimizer 120 uses an inferred storage structure database (ISSD) 123 to match storage blocks with their associated entities in the high-level data structure. For example, given a specific storage block location, the storage block access optimizer 120 may use the ISSD 123 to identify the file or directory in a file system, or the database table, record, or node, that is using this storage block to store some or all of its data.
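The lookup that the ISSD 123 performs can be pictured as a two-way index between storage block locations and high-level entities. The Python sketch below is only an illustration under that assumption: the class name, method names, and in-memory dictionaries are hypothetical, and the specification does not describe the ISSD's internal layout at this point in the text.

```python
class InferredStorageStructureDB:
    """Hypothetical in-memory stand-in for the ISSD: a block <-> entity index."""

    def __init__(self):
        self._entity_of_block = {}   # block address -> entity name
        self._blocks_of_entity = {}  # entity name -> ordered block addresses

    def record(self, entity, blocks):
        """Record that `entity` (e.g. a file path) occupies `blocks`."""
        blocks = list(blocks)
        self._blocks_of_entity[entity] = blocks
        for block in blocks:
            self._entity_of_block[block] = entity

    def entity_for(self, block):
        """Return the entity using this block, or None if unknown."""
        return self._entity_of_block.get(block)

    def blocks_for(self, entity):
        """Return the blocks used by this entity (possibly non-contiguous)."""
        return list(self._blocks_of_entity.get(entity, []))
```

For example, after record("/docs/report.txt", [12, 907, 13]), a request for block 907 resolves to that file, and blocks_for() then lists the file's other blocks even though they are not adjacent on the device.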

Once the storage block access optimizer 120 has identified the high-level data structure entity associated with a storage block, the storage block access optimizer 120 may employ a number of different techniques to predict which additional storage blocks are likely to be requested by a storage client. For example, storage block access optimizer 120 may observe requests from a storage client 139 for storage blocks from the virtual data storage array 137, identify the high-level data structure entities associated with the requested storage blocks, and select additional storage blocks associated with these or other high-level data structure entities for prefetching. These types of storage block prefetching techniques are referred to as reactive prefetching. In another example, the storage block access optimizer 120 may analyze entities in the high-level data structures, such as files, directories, or database entities, to identify specific entities or portions thereof that are likely to be requested by the storage client 139. Using the ISSD 123, the storage block access optimizer 120 identifies storage blocks corresponding with these identified entities or portions thereof and prefetches these storage blocks for storage in the block read cache 147 at the branch location 102. These types of storage block prefetching techniques are referred to as policy-based prefetching. Further examples of reactive and policy-based prefetching are discussed below. Embodiments of the storage block access optimizer 120 may utilize any combination of reactive and policy-based prefetching techniques to select storage blocks to be prefetched and stored in the block read cache 147 at the branch location 102.
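One simple form of reactive prefetching is to prefetch the rest of whatever entity a requested block belongs to. The Python sketch below illustrates only that rule; it is an assumption made for illustration, not the selection logic of the specification, and the function name and the plain dictionaries standing in for the ISSD are hypothetical.

```python
def select_prefetch_blocks(requested_block, entity_of_block, blocks_of_entity, cached):
    """Pick additional blocks to prefetch in reaction to one read request.

    entity_of_block:  dict mapping block address -> entity name
    blocks_of_entity: dict mapping entity name -> list of block addresses
    cached:           set of block addresses already held at the branch

    Rule used here (illustrative only): prefetch every other block of the
    same entity that is not already cached.
    """
    entity = entity_of_block.get(requested_block)
    if entity is None:
        return []  # block not associated with any known entity
    return [
        block
        for block in blocks_of_entity.get(entity, [])
        if block != requested_block and block not in cached
    ]
```

A policy-based variant would start from a chosen entity instead of an observed request and pass that entity's block list through the same cache filter.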

In a further embodiment, the branch location 102 and data center location 101 may optionally include network optimizers 125 for improving the performance of data communications over the WAN between branches and/or the data center. Network optimizers 125 can improve actual and perceived WAN network performance using techniques including compressing data communications; anticipating and prefetching data; caching frequently accessed data; shaping and restricting network traffic; and optimizing usage of network protocols. In an embodiment, network optimizers 125 may be used in conjunction with virtual data storage array interfaces 107 and 135 to further improve virtual storage array 137 performance for storage blocks accessed via the WAN 130. In other embodiments, network optimizers 125 may ignore or pass through virtual storage array 137 data traffic, relying on the virtual storage array interfaces 107 and 135 at the data center 101 and branch location 102 to optimize WAN performance.

Further embodiments of the invention may be used in different network architectures. For example, a data center virtual storage array interface 107 may be connected directly between WAN 130 and a physical data storage array 103, eliminating the need for a data center LAN. Similarly, a branch location virtual storage array interface 135, implemented for example in the form of a software application executed by a storage client computer system, may be connected directly with WAN 130, such as the internet, eliminating the need for a branch location LAN. In another example, the data center and branch location virtual data storage array interfaces 107 and 135 may be combined into a single unit, which may be located at the branch location 102.

FIGS. 2A-2B illustrate methods of prefetching storage blocks to improve virtualized data storage system performance according to embodiments of the invention. FIG. 2A illustrates a method 200 of performing reactive prefetching of storage blocks according to an embodiment of the invention. Step 205 receives a storage block read request from a storage client at the branch location. In an embodiment, the storage block read request may be received by a branch location virtual data storage array interface.

In response to the receipt of the storage block read request in step 205, decision block 210 determines if the requested storage block has been previously retrieved and stored in the storage block read cache at the branch location. If so, step 220 retrieves the requested storage block from the storage block read cache and returns it to the requesting storage client. In an embodiment, if the system includes a data center virtual storage array interface, then step 220 also forwards the storage block read request back to the data center virtual storage array interface for use in identifying additional storage blocks likely to be requested by the storage client in the future.

If the storage block read cache at the branch location does not include the requested storage block, step 215 retrieves the requested storage block via a WAN connection from the virtual storage array data located in a physical data storage at the data center. In an embodiment, a branch location virtual storage array interface forwards the storage block read request to the data center virtual storage array interface via the WAN connection. The data center virtual storage array interface then retrieves the requested storage block from the physical storage array and returns it to the branch location virtual storage array interface, which in turn provides this requested storage block to the storage client. In a further embodiment of step 215, a copy of the retrieved storage block may be stored in the storage block read cache for future accesses.

During and/or following the retrieval of the requested storage block from the virtual storage array or virtual storage array cache, steps 225 to 250 prefetch additional storage blocks likely to be requested by the storage client in the near future. Step 225 identifies the high-level data structure entity associated with the requested storage block. Typical block storage protocols, such as iSCSI and FCP, specify block read requests using a storage block address or identifier. However, these storage block read requests do not include any identification of the high-level data structure, such as a file, directory, or database entity, that is associated with this storage block. Therefore, an embodiment of step 225 accesses an ISSD to identify the high-level data structure associated with the requested storage block.

In an embodiment, step 225 provides the ISSD with the storage block address or identifier. In response, the ISSD returns an identifier of the high-level data structure entity associated with the requested storage block. The identifier of the high-level data structure entity may be an inode or similar file system identifier or a database storage structure identifier, such as a database table or B-tree node. In a further embodiment, the ISSD also includes a location within the high-level data structure entity corresponding with the requested storage block. For example, step 225 may provide a storage block identifier to the ISSD and in response receive the inode or other file system identifier for a file stored in this storage block. Additionally, the ISSD can return an offset, index, or other file location indicator that specifies the portion of this file stored in the storage block.
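The two-way lookup described for the ISSD can be illustrated with a minimal sketch. The class name `ToyISSD`, its method names, and the use of in-memory dictionaries are assumptions made purely for illustration; the specification does not prescribe how the ISSD is stored or queried.

```python
from typing import NamedTuple, Optional


class EntityLocation(NamedTuple):
    entity_id: int  # e.g. an inode number (hypothetical representation)
    offset: int     # byte offset of the entity portion held in the block


class ToyISSD:
    """Hypothetical in-memory stand-in for the ISSD (illustration only)."""

    def __init__(self) -> None:
        self._by_block = {}   # block address -> EntityLocation
        self._by_entity = {}  # (entity_id, offset) -> block address

    def record(self, block: int, entity_id: int, offset: int) -> None:
        # Remember the association in both directions.
        self._by_block[block] = EntityLocation(entity_id, offset)
        self._by_entity[(entity_id, offset)] = block

    def entity_for_block(self, block: int) -> Optional[EntityLocation]:
        # Lookup used by step 225: block address -> entity and location.
        return self._by_block.get(block)

    def block_for_entity(self, entity_id: int, offset: int) -> Optional[int]:
        # Reverse lookup: entity portion -> block address.
        return self._by_entity.get((entity_id, offset))


issd = ToyISSD()
issd.record(block=900, entity_id=42, offset=0)
issd.record(block=17, entity_id=42, offset=4096)
```

In this sketch a block unknown to the database simply yields `None`, leaving the caller free to skip prefetching for that request.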

Using the identification of the high-level data structure entity and optionally the location provided by the ISSD, step 230 identifies additional high-level data structure entities or portions thereof that are likely to be requested by the storage client. There are a number of different techniques for identifying additional high-level data structure entities or portions thereof for prefetching that may be used by embodiments of step 230. Some of these are described in detail in co-pending U.S. patent application Ser. No. ______ [Attorney Docket Number R001420US], entitled “Virtual Data Storage System Optimizations”, filed ______, which is incorporated by reference herein for all purposes.

One example technique is to prefetch portions of the high-level data structure entity based on their adjacency or close proximity to the identified portion of the entity. For example, if step 225 determines that the requested storage block corresponds with a portion of a file from file offset 0 up to offset 4095, then step 230 may identify a second portion of this same file beginning with offset 4096 for prefetching. It should be noted that although these two portions are adjacent in the high-level data structure entity, their corresponding storage blocks may be non-contiguous.
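A minimal sketch of this adjacency technique follows. The 4096-byte portion size is taken from the example above; the dictionary contents, the block addresses, and the function name `adjacent_blocks` are hypothetical and stand in for whatever the ISSD actually provides.

```python
BLOCK_SIZE = 4096  # assumed portion size, matching the 0..4095 example

# Hypothetical ISSD contents: (file id, file offset) -> storage block address.
offset_to_block = {
    (42, 0): 900,
    (42, 4096): 17,    # adjacent in the file, far away on the volume
    (42, 8192): 312,
}
# Reverse mapping: storage block address -> (file id, file offset).
block_to_offset = {blk: key for key, blk in offset_to_block.items()}


def adjacent_blocks(requested_block, lookahead=1):
    """Return blocks holding the next `lookahead` portions of the same file."""
    located = block_to_offset.get(requested_block)
    if located is None:
        return []  # block not known to the ISSD; nothing to prefetch
    file_id, offset = located
    result = []
    for i in range(1, lookahead + 1):
        blk = offset_to_block.get((file_id, offset + i * BLOCK_SIZE))
        if blk is not None:
            result.append(blk)
    return result
```

Here a request for block 900 selects block 17 for prefetching even though the two addresses are not contiguous, because adjacency is evaluated in file offsets rather than in block addresses.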

Another example technique is to identify the type of high-level data structure entity, such as a file of a specific format, a directory in a file system, or a database table, and apply one or more heuristics to identify additional portions of this high-level data structure entity or a related high-level data structure entity for prefetching. For example, applications employing a specific type of file may frequently access data at a specific location within these files, such as at the beginning or end of the file. Using knowledge of this application or entity-specific behavior, step 230 may identify these frequently accessed portions of the file for prefetching.

Yet another example technique monitors the times at which high-level data structure entities are accessed. High-level data structure entities that are accessed at approximately the same time are associated together by the virtual storage array architecture. If any one of these associated high-level data structure entities is later accessed again, an embodiment of step 230 identifies one or more associated high-level data structure entities that were previously accessed at approximately the same time as the requested high-level data structure entity for prefetching. For example, a storage client may have previously requested storage blocks from files A, B, and C at approximately the same time, such as within a minute of each other. Based on this previous access pattern, if step 225 determines that a requested storage block is associated with file A, step 230 may identify all or portions of files B and C for prefetching.
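One way such time-based associations might be tracked is sketched below. The 60-second window follows the "within a minute" example; the class `TemporalAssociations`, its methods, and the choice to associate entities symmetrically are assumptions made for illustration only.

```python
from collections import defaultdict

WINDOW_SECONDS = 60.0  # "within a minute of each other", per the example


class TemporalAssociations:
    """Hypothetical tracker of entities accessed at about the same time."""

    def __init__(self):
        self._recent = []                 # (timestamp, entity) pairs
        self._related = defaultdict(set)  # entity -> associated entities

    def record_access(self, entity, now):
        # Forget accesses that fall outside the association window.
        self._recent = [(t, e) for t, e in self._recent
                        if now - t <= WINDOW_SECONDS]
        # Associate the new access with everything still in the window.
        for _, other in self._recent:
            if other != entity:
                self._related[entity].add(other)
                self._related[other].add(entity)
        self._recent.append((now, entity))

    def candidates(self, entity):
        # Entities previously seen together with `entity`.
        return set(self._related.get(entity, ()))


tracker = TemporalAssociations()
tracker.record_access("A", now=0.0)
tracker.record_access("B", now=10.0)
tracker.record_access("C", now=30.0)
tracker.record_access("D", now=500.0)
```

After these accesses, a later request touching file A would yield files B and C as prefetch candidates, while file D, accessed much later, is associated with nothing.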

In still another example technique, step 230 analyzes the high-level data structure entity associated with the requested storage block to identify related portions of the same or other high-level data structure entity for prefetching. For example, application files may include references to additional files, such as overlay files or dynamically loaded libraries. Similarly, a database table may include references to other database tables. Once step 225 identifies the high-level data structure entity associated with a requested storage block, step 230 may use an analysis of this high-level data structure entity to identify additional referenced high-level data structure entities. The referenced high-level data structure entities may be prefetched. In an embodiment, the analysis of high-level data structure entities for references to other high-level data structure entities may be performed asynchronously with method 200.

Step 230 identifies all or portions of one or more high-level data structure entities for prefetching based on the high-level data structure entity associated with the requested storage block. However, as discussed above, storage clients specify data access requests in terms of storage blocks, not high-level data structure entities such as files, directories, or database tables. Thus, step 235 identifies one or more storage blocks corresponding with the high-level data structure entities identified for prefetching in step 230. In an embodiment, step 235 provides the ISSD with identifiers for one or more high-level data structure entities, such as the inodes of files or similar identifiers for other types of file systems or database storage structures. Optionally, step 235 also provides an offset, file location, or other type of address to identify a specific portion of a high-level data structure entity to be prefetched. In response, the ISSD returns an identifier of one or more storage blocks associated with the high-level data structure entities. These identified storage blocks are used to store the high-level data structure entities or portions thereof.

Decision block 240 determines if the storage blocks identified in step 235 have already been stored in the storage block read cache located at the branch location. In an embodiment, the storage block access optimizer at the data center maintains a record of all of the storage blocks that have copies stored in the storage block read cache. In an alternate embodiment, the storage block access optimizer queries the branch location virtual storage array interface to determine if copies of these identified storage blocks have already been stored in the storage block read cache.

In still a further embodiment, decision block 240 and the determination of whether an additional storage block has been previously retrieved and cached may be omitted. Instead, this embodiment can send all of the additional storage blocks identified by step 235 to the branch location virtual storage array interface to be cached. This embodiment can be used when WAN latency, rather than WAN bandwidth, is the overriding concern.

If all of the identified storage blocks from step 235 are already stored in the storage block read cache, then method 200 proceeds from decision block 240 back to step 205 to await receipt of further storage block requests.

If some or all of the storage blocks identified in step 235 are not already stored in the storage block read cache, then step 245 retrieves these uncached storage blocks from the virtual storage array data located in a physical data storage on the data center LAN. The retrieved storage blocks are sent via the WAN connection from the data center location to the branch location. In an embodiment of step 245, the data center virtual storage array interface receives a request for the uncached identified storage blocks from the storage block access optimizer and, in response, accesses the physical data storage array to retrieve these storage blocks. The data center virtual storage array interface then forwards these storage blocks to the branch location virtual storage array interface via the WAN connection.

Step 250 stores the storage blocks identified for prefetching in the storage block read cache. In an embodiment of step 250, the branch location virtual storage array interface receives one or more storage blocks from the data center virtual storage array interface via the WAN connection and stores these storage blocks in the storage block read cache. Following step 250, method 200 proceeds to step 205 to await receipt of further storage block requests. The storage blocks added to the storage block read cache in previous iterations of method 200 may be available for fulfilling storage block read requests.
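The overall flow of method 200 can be sketched for a single read request as follows. The function `handle_read`, its callable parameters, and the dictionary-based cache are hypothetical stand-ins; in particular, this sketch performs the prefetch synchronously, whereas the prefetch described above may proceed during or after the retrieval of the requested block.

```python
def handle_read(block, cache, fetch_over_wan, predict_blocks):
    """One pass of the reactive prefetching flow for a single read request.

    cache:          dict of block address -> data (branch read cache)
    fetch_over_wan: callable(block) -> data; stands in for the data center
    predict_blocks: callable(block) -> iterable of blocks (steps 225 to 235)
    """
    if block in cache:                 # decision block 210
        data = cache[block]            # step 220: serve from the read cache
    else:
        data = fetch_over_wan(block)   # step 215: retrieve via the WAN
        cache[block] = data            # keep a copy for future accesses
    for extra in predict_blocks(block):            # steps 225 to 235
        if extra not in cache:                     # decision block 240
            cache[extra] = fetch_over_wan(extra)   # steps 245 and 250
    return data


wan_reads = []


def fetch(block):
    wan_reads.append(block)  # record each WAN round trip
    return f"data-{block}"


cache = {}
first = handle_read(900, cache, fetch, lambda b: [17] if b == 900 else [])
second = handle_read(17, cache, fetch, lambda b: [])
```

In this usage the read of block 900 also pulls block 17 across the WAN, so the subsequent read of block 17 is served from the cache without a further WAN access.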

Method 200 may be performed by a branch virtual data storage array interface, by a data center virtual data storage array interface, or by both virtual data storage array interfaces working in concert. For example, steps 205 to 220 of method 200 may be performed by a branch location virtual storage array interface and steps 225 to 250 of method 200 may be performed by a data center virtual storage array interface. In another example, all of the steps of method 200 may be performed by a branch location virtual storage array interface.

FIG. 2B illustrates a method 255 of performing policy-based prefetching of storage blocks according to an embodiment of the invention. Step 260 selects a high-level data structure entity for analysis. Examples of selected high-level data structure entities include a file, directory, and other file system entity such as an inode, as well as database entities such as tables, records, and B-tree nodes or other structures.

Step 265 analyzes the selected high-level data structure entity to identify additional portions of the same high-level data structure entity or all or portions of additional high-level data structure entities that are likely to be requested by the storage client. There are a number of different techniques for identifying additional high-level data structures or portions thereof for prefetching that may be used by embodiments of step 265. Some of these are described in detail in co-pending U.S. patent application Ser. No. ______ [Attorney Docket Number R001420US], entitled “Virtual Data Storage System Optimizations”, filed ______, which is incorporated by reference herein for all purposes.

One example technique is to identify the type of entity, such as a file of a specific format, a directory in a file system, or a database table, and apply one or more heuristics to identify additional portions of this high-level data structure entity or a related high-level data structure entity for prefetching. For example, applications employing a specific type of file may frequently access data at a specific location within these files, such as at the beginning or end of the file. Using knowledge of this application or entity-specific behavior, step 265 may identify the beginning or end portions of these types of files for prefetching.

In another example technique, step 265 analyzes the selected high-level data structure entity to identify related portions of the same or other high-level data structure entity for prefetching. For example, application files may include references to additional files, such as overlay files or dynamically loaded libraries. Similarly, a database table may include references to other database tables. Step 265 may use an analysis of this high-level data structure entity to identify additional referenced high-level data structure entities. The referenced high-level data structure entities may be prefetched.

In still another example technique, step 265 may analyze application, virtual machine, or operating system specific files or other high-level data structure entities to identify additional high-level data structure entities for prefetching. For example, step 265 may analyze application or operating system log files to identify the sequence of files accessed during operations such as system or application start-up. These identified files may then be selected for prefetching.

Once step 265 has identified one or more high-level data structure entities or portions thereof for prefetching, step 270 identifies one or more storage blocks corresponding with these high-level data structure entities. This mapping is needed because, as discussed above, storage clients specify data access requests in terms of storage blocks, not high-level data structure entities such as files, directories, or database tables. In an embodiment, step 270 provides the ISSD with identifiers for one or more high-level data structure entities, such as the inodes of files or similar identifiers for other types of file systems or database storage structures. Optionally, step 270 also provides an offset, file location, or other type of address to identify a specific portion of a high-level data structure entity to be prefetched. In response, the ISSD returns an identifier of one or more storage blocks associated with the high-level data structure entities. These storage blocks are used to store the high-level data structure entities or portions thereof.

Decision block 275 determines if the storage blocks identified in step 270 have already been stored in the storage block read cache located at the branch location. In an embodiment, the storage block access optimizer at the data center maintains a record of all of the storage blocks that have copies stored in the storage block read cache. In an alternate embodiment, the storage block access optimizer queries the branch location virtual storage array interface to determine if copies of these identified storage blocks have already been stored in the storage block read cache.

In still a further embodiment, decision block 275 and the determination of whether an additional storage block has been previously retrieved and cached may be omitted. Instead, this embodiment can send all of the additional storage blocks identified by step 270 to the branch location virtual storage array interface to be cached. This embodiment can be used when WAN latency, rather than WAN bandwidth, is the overriding concern.

If all of the identified storage blocks from step 270 are already stored in the storage block read cache, then method 255 proceeds from decision block 275 to step 280. Optional step 280 determines if there are additional high-level data structure entities that should be included in the analysis of method 255, based on the results of step 265. For example, if steps 260 and 265 analyze a first file and identify a second file that should be prefetched, step 280 may include this second file in a list of high-level data structure entities to be analyzed by method 255, potentially identifying additional files from the analysis of this second file.

If some or all of the storage blocks identified in step 270 are not already stored in the storage block read cache, then step 285 retrieves these uncached storage blocks from the virtual storage array data located in a physical data storage on the data center LAN. The retrieved storage blocks are sent via the WAN connection from the data center location to the branch location. In an embodiment of step 285, the data center virtual storage array interface receives a request for the uncached identified storage blocks from the storage block access optimizer and accesses the physical data storage array to retrieve these storage blocks. The data center virtual storage array interface then forwards these storage blocks to the branch location virtual storage array interface via the WAN connection.

Step 290 stores the storage blocks identified for prefetching in the storage block read cache. In an embodiment of step 290, the branch location virtual storage array interface receives one or more storage blocks from the data center virtual storage array interface via the WAN connection and stores these storage blocks in the storage block read cache. Following step 290, method 255 proceeds to optional step 280. The storage blocks added to the storage block read cache in previous iterations of method 255 may be available for fulfilling storage block read requests.

Following step 280 or, if step 280 is omitted, decision block 275 or step 290, an embodiment of method 255 proceeds to step 260 to select another high-level data structure entity for analysis.
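The iteration of method 255, including the optional step of adding newly identified entities to the list of entities to be analyzed, can be sketched as a simple worklist. The function `policy_prefetch`, the `references` and `blocks_of` dictionaries, and the file names are hypothetical; a real embodiment would derive references by analyzing the entities and would obtain block lists from the ISSD.

```python
from collections import deque


def policy_prefetch(seeds, references, blocks_of, cache, fetch_over_wan):
    """Worklist sketch of policy-based prefetching.

    references: dict entity -> entities it refers to (assumed analysis result)
    blocks_of:  dict entity -> storage blocks holding it (assumed ISSD result)
    Returns the set of entities that were analyzed.
    """
    pending = deque(seeds)
    seen = set(seeds)
    while pending:
        entity = pending.popleft()                    # step 260
        for target in references.get(entity, []):     # step 265
            for block in blocks_of.get(target, []):   # step 270
                if block not in cache:                # decision block 275
                    cache[block] = fetch_over_wan(block)  # steps 285 and 290
            if target not in seen:                    # optional step 280
                seen.add(target)
                pending.append(target)
    return seen


references = {
    "app.exe": ["lib1.dll"],
    "lib1.dll": ["lib2.dll"],
    "lib2.dll": ["lib1.dll"],  # a cycle; the `seen` set prevents looping
}
blocks_of = {"lib1.dll": [5, 6], "lib2.dll": [9]}
cache = {6: "cached"}
fetched = []


def fetch(block):
    fetched.append(block)  # record each block retrieved over the WAN
    return f"data-{block}"


analyzed = policy_prefetch(["app.exe"], references, blocks_of, cache, fetch)
```

Tracking already analyzed entities is a design choice of this sketch; it keeps mutually referencing files from being analyzed indefinitely.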

In an embodiment, steps 285 and 290 may be performed asynchronously or in parallel with further iterations of method 255. For example, a storage block access optimizer may direct the data center virtual storage array interface to retrieve one or more storage blocks. While this operation is being performed, the storage block access optimizer may continue with the execution of method 255 by proceeding to optional step 280 to identify further high-level data structure entities for analysis, and/or returning to step 260 for an additional iteration of method 255. When the data center virtual storage array interface has completed its retrieval of one or more storage blocks as requested, step 290 may be performed in the background and in parallel to transfer these storage blocks via the WAN to the branch location for storage in the storage block read cache.

Method 255 may be performed by a branch virtual data storage array interface, by a data center virtual data storage array interface, or by both virtual data storage array interfaces working in concert. For example, steps 260 to 285 of method 255 may be performed by a data center virtual storage array interface. In another example, all of the steps of method 255 may be performed by a branch location virtual storage array interface.

Embodiments of both methods 200 and 255 utilize the ISSD to identify high-level data structure entities from storage blocks and/or to identify storage blocks from their associated high-level data structure entities. An embodiment of the invention creates the ISSD by initially searching high-level data structure entities, such as a master file table, allocation table or tree, or other types of file system metadata structures, to identify the high-level data structure entities corresponding with the storage blocks. An embodiment of the invention may further recursively analyze other high-level data structure entities, such as inodes, directory structures, files, and database tables and nodes, that are referenced by the master file table or other high-level data structures. This initial analysis may be performed by either the branch location or data center virtual storage array interface as a preprocessing activity or in the background while processing storage client requests. In an embodiment, the ISSD may be updated frequently or infrequently, depending upon the desired prefetching performance. Embodiments of the invention may update the ISSD by periodically scanning the high-level data structure entities or by monitoring storage client activity for changes or additions to the virtual storage array, which are then used to update the affected portions of the ISSD.

As described above, embodiments of the invention prefetch storage blocks from the data center storage array and cache these storage blocks in a storage block cache located at the branch location. In some embodiments, the storage block cache may be smaller than the virtual storage array. Thus, when the storage block cache is full, the branch or data center virtual storage array interface may need to occasionally evict or remove some storage blocks from the storage block cache to make room for other prefetched storage blocks. In an embodiment, the branch virtual storage array interface may use any cache replacement scheme or policy known in the art, such as a least recently used (LRU) cache management policy.

In another embodiment, the storage block cache replacement policy of the storage block cache is based on an understanding of the relationship between storage blocks and corresponding high-level data structure entities, such as file system or database entities. In this embodiment, even though the storage block cache operates on the basis of storage blocks, the storage block cache replacement policies determine whether to retain or evict storage blocks in the storage block cache based on their associations to files or other high-level data structure entities.

For example, when a virtual storage array interface needs to evict storage blocks from the storage block cache to create free space for other prefetched storage blocks, an embodiment of the virtual storage interface uses information associating storage blocks with corresponding files to evict all of the storage blocks associated with a single file, rather than evicting some storage blocks from one file and some from another file. In this example, storage blocks are not necessarily evicted based on their own usage alone, but on the overall usage of their associated file or other high-level data structure entity.
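A minimal sketch of such file-granular eviction is shown below. The function `evict_by_file`, its parameters, and the choice of least recently used files as victims are assumptions for illustration; the text above leaves the specific measure of "overall usage" open.

```python
def evict_by_file(cache, block_to_file, file_last_access, blocks_needed):
    """Evict every cached block of the least recently used files until at
    least `blocks_needed` blocks are freed. Returns the evicted blocks."""
    # Group the cached blocks by their owning file (assumed ISSD lookup).
    by_file = {}
    for block in cache:
        by_file.setdefault(block_to_file.get(block), []).append(block)
    # Oldest file-level access first; files with no recorded access go first.
    order = sorted(
        by_file, key=lambda f: file_last_access.get(f, float("-inf"))
    )
    evicted = []
    for file_id in order:
        if len(evicted) >= blocks_needed:
            break
        for block in by_file[file_id]:  # evict the whole file, never a part
            del cache[block]
            evicted.append(block)
    return evicted


cache = {1: "a", 2: "b", 3: "c", 4: "d"}
block_to_file = {1: "A", 2: "A", 3: "B", 4: "B"}
file_last_access = {"A": 100.0, "B": 200.0}
evicted = evict_by_file(cache, block_to_file, file_last_access, blocks_needed=1)
```

Although only one block of free space is requested in this usage, both blocks of file A are evicted together, and file B remains fully cached.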

As another example, the storage block cache may elect to preferentially retain storage blocks including file system metadata and/or directory structures over other storage blocks that include file data only.

In yet another example, the storage block cache may identify files or other high-level data structure entities that have not been accessed recently, and then use the ISSD to identify and select the storage blocks corresponding with these infrequently used files for eviction.

Although these examples of storage block cache replacement policies are discussed with reference to file and file systems, similar techniques can be applied to databases and other types of high-level data structure entities.

In addition to selectively evicting storage blocks based on their associated high-level data structure entities, an embodiment of the virtual storage array system can also include cache policies to preferentially retain or “pin” specific storage blocks in the storage block cache, regardless of their usage or other factors. These cache retention policies can ensure that specific storage blocks are always accessible at the branch location, even at times when the WAN is unavailable, since copies of these storage blocks will always exist in the storage block cache.

In this embodiment, a user, administrator, or administrative application may specify all or a portion of the virtual storage array for preferential retention or pinning in the storage block cache. Upon receiving a request to pin some or all of the virtual storage array data in the storage block cache, the virtual storage array system determines if the storage block cache has sufficient additional capacity to store the specified storage blocks. If the storage block cache has sufficient capacity, the virtual storage array system reserves space in the storage block cache for the specified storage blocks; otherwise this request is denied.
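The admission check for a pinning request can be sketched as follows. The class `PinningCache` and its method are hypothetical, and the sketch assumes, for simplicity, that capacity is counted in blocks and that only pinned blocks consume the reservable capacity; the text does not specify how available capacity is measured.

```python
class PinningCache:
    """Hypothetical block cache that reserves space for pinned blocks."""

    def __init__(self, capacity_blocks):
        self.capacity_blocks = capacity_blocks
        self.pinned = set()

    def request_pin(self, blocks):
        # Blocks that are already pinned need no additional reservation.
        new_blocks = set(blocks) - self.pinned
        if len(self.pinned) + len(new_blocks) > self.capacity_blocks:
            return False          # insufficient capacity: request denied
        self.pinned |= new_blocks  # reserve space for the request
        return True


pin_cache = PinningCache(capacity_blocks=4)
granted_first = pin_cache.request_pin([1, 2, 3])
granted_second = pin_cache.request_pin([3, 4])
granted_third = pin_cache.request_pin([5])
```

The request is granted or denied as a whole in this sketch, so a denied request leaves the set of pinned blocks unchanged.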

If the storage block cache has sufficient capacity to satisfy the pinning request, the cache also may initiate a proactive prefetch process to retrieve any requested storage blocks that are not already in the storage block cache from the data center via the WAN. For large pinning requests, such as an entire virtual storage array, it may take hours or days for this proactive prefetch to be completed. In a further embodiment, this proactive prefetching of pinned storage blocks may be performed asynchronously and at a lower priority than storage clients' requests for virtual storage array read operations, associated prefetching (discussed above), and the virtual storage array write operations (discussed below). This embodiment may be used to deploy data to a new branch location. For example, upon activation of the branch storage array interface, the virtual storage array data is copied asynchronously via the WAN to the branch location storage block cache. Although this data transfer may take some time to complete, storage clients at this new branch location can access virtual storage array data immediately using the virtual storage array read and write operations, with the above-described storage block prefetching hiding the bandwidth and latency limitations of the WAN when storage clients access storage blocks that have yet to be copied to the branch location.

In another embodiment, the storage block cache may allow users, administrators, and administrative applications to directly specify the pinning of high-level data structure entities, such as files or database elements, as opposed to specifying storage blocks for pinning in the storage block cache. In this embodiment, the virtual storage array uses the ISSD to identify storage blocks corresponding with the specified high-level data structure entities. In a further embodiment, the virtual storage array may allow users, administrators, and administrative applications to specify only a portion of high-level data structure entities for pinning, such as file metadata and frequently used indices within high-level data structure entities. The virtual storage array then uses the associations between storage blocks and high-level data structure entities from the ISSD to identify specific storage blocks to be pinned in the storage block cache.

Similarly, the virtual storage array cache can be used to hide latency and bandwidth limitations of the WAN during virtual storage array writes. FIG. 3 illustrates a method 300 of processing storage block write requests to improve virtualized data storage system performance according to an embodiment of the invention.

An embodiment of method 300 starts with step 305 receiving a storage block write request from a storage client within the branch location LAN. The storage block write request may be received from a storage client by a branch location virtual storage interface.

In response to the receipt of the storage block write request, decision block 310 determines if the storage block write cache in the virtual storage array cache at the branch location is capable of accepting additional write requests or is full. In an embodiment, the virtual storage array cache may use some or all of its storage as a storage block write cache for pending virtual storage array write operations.

If the storage block write cache in the virtual storage array cache can accept an additional storage block write request, then step 315 stores the storage block write request, including the storage block data to be written, in the storage block write cache. Step 320 then sends a write acknowledgement to the storage client. Following the storage client's receipt of this write acknowledgement, the storage client believes its storage block write request is complete and can continue to operate normally. Step 325 then transfers the queued written storage block via the WAN to the physical storage array at the data center LAN. This transfer may occur in the background and asynchronously with the operation of storage clients.

While a storage block write request is queued in the storage block write cache and waiting to be transferred to the data center, a storage client may wish to access this storage block for a read or an additional write. In this situation, the virtual storage array interface intercepts the storage block access request. In the case of a storage block read, the virtual storage array interface provides the storage client with the previously queued storage block. In the case of a storage block write, the virtual storage array interface will update the queued storage block data and send a write acknowledgement to the storage client for this additional storage block access.

Conversely, if decision block 310 determines that the storage block write cache cannot accept an additional storage block write request, then step 330 immediately transfers the storage block via the WAN to the physical storage array at the data center LAN. In an embodiment of step 335, the branch location virtual storage array interface receives a write confirmation that the storage block write operation is complete. This confirmation may be received from a data center virtual storage array interface or directly from a physical storage array or other data storage device. Following completion of this transfer, step 340 sends a write acknowledgement to the storage client, allowing the storage client to resume normal operation.
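The two branches of method 300, together with the handling of reads and repeated writes to a queued block, can be sketched as follows. The class `BranchWriteCache`, its return values, and the `send_to_data_center` callable are hypothetical; the sketch counts capacity in blocks and does not attempt to model write order preservation.

```python
class BranchWriteCache:
    """Hypothetical write cache illustrating the flow of method 300."""

    def __init__(self, capacity, send_to_data_center):
        self.capacity = capacity
        self.send = send_to_data_center  # callable(block, data); WAN stand-in
        self.pending = {}                # block -> data, in arrival order

    def write(self, block, data):
        if block in self.pending:
            self.pending[block] = data   # update the queued block data
            return "ack-cached"
        if len(self.pending) < self.capacity:  # decision block 310
            self.pending[block] = data         # step 315
            return "ack-cached"                # step 320
        self.send(block, data)                 # step 330: immediate transfer
        return "ack-written"                   # step 340, after confirmation

    def read(self, block):
        # Serve a read from the queued data, if this block is queued.
        return self.pending.get(block)

    def flush_one(self):
        """Background transfer of the oldest queued block (step 325)."""
        if not self.pending:
            return False
        block = next(iter(self.pending))
        self.send(block, self.pending.pop(block))
        return True


sent = []
wc = BranchWriteCache(2, lambda b, d: sent.append((b, d)))
r1 = wc.write(4, "v1")
r2 = wc.write(4, "v2")   # overwrites the queued data for block 4
queued = wc.read(4)
r3 = wc.write(7, "x")
r4 = wc.write(9, "y")    # cache full: written through synchronously
wc.flush_one()
```

In this usage the first three writes are acknowledged from the cache, the write to block 9 is sent across the WAN before being acknowledged, and the background flush later transfers the updated contents of block 4.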

In a further embodiment, a branch location virtual storage array interface may throttle storage block read and/or write requests from storage clients to prevent the virtual storage array cache from filling up under typical usage scenarios.

To prevent data loss or corruption in the face of unexpected events such as power failures, typical file systems and databases issue data writes to block storage devices in a specific order and with certain dependencies to maintain internal consistency of structures and ensure the desired semantics for modifications. For example, most transactional databases employ write-ahead logging techniques when modifying index structures, so that in case of failure, any operations that are logged but not completed can be replayed upon restart.

Embodiments of the virtual storage array use write order preservation to maintain data consistency. In these embodiments, the storage block cache tracks the order in which write requests are received and can use this ordering information when forwarding the storage block write requests to the physical storage array via the WAN, as described by step 325.

FIGS. 4A-4C illustrate three write order preservation policies according to an embodiment of the invention. FIG. 4A illustrates the contents of an example storage block write WAN queue 400. Storage block write WAN queue 400 is used by embodiments of a branch virtual storage array interface to schedule the transmission of storage blocks written by storage clients at the branch location from the storage block write cache to the physical storage array at the data center location. In the example storage block write WAN queue 400, a sequence of ten write operations from one or more branch storage clients is recorded. For each write operation in this example sequence, the storage block write WAN queue 400 includes a reference to the storage block written by this write operation. For example, the first or earliest write operation received, write operation 1, is a write to storage block 4 and the last write or most recent write operation received, write operation 10, is a write to storage block 5.

In an embodiment of the invention, a first write order preservation policy is to preserve the semantics of the original file system, database, or other high-level data structure entity by forwarding all block write requests over the WAN to the physical storage array in the same order that they were received by the virtual storage array cache. Thus, the branch virtual storage array interface will communicate written storage blocks to the physical storage array at the data center via the WAN in the same sequence as shown in example storage block write WAN queue 400.

When using this policy, the image of the file system or database that exists on the physical storage array is always an internally consistent replica of the modifications made by storage clients at some point in time. Additionally, snapshots of the virtual storage array data, such as snapshots A and B, are guaranteed to be internally consistent, because they include all of the write operations prior to the snapshot time. However, if the same storage blocks are written to multiple times prior to their transfer to the physical storage array, this write order preservation policy requires the storage block write cache to keep track of multiple versions of these storage blocks and forward all of the write operations to these different versions of the storage block in the order received. Moreover, this policy requires more WAN bandwidth because every version of a storage block in the storage block write WAN queue must be forwarded to the data center, even if these versions are superseded by more recent versions of the storage block already in the storage block write WAN queue. For example, in storage block write WAN queue 400, storage block 3 is written to in write operations 2, 4, and 7. Thus, the storage block write cache must transmit all three of these versions of storage block 3 in the order that they were received.

In another embodiment of the invention, a second write order preservation policy forwards only the most recently written version of each storage block in the storage block write cache. FIG. 4B illustrates an example storage block WAN transmission order 405 according to this embodiment of the invention. Example storage block WAN transmission order 405 is based on the example storage block write WAN queue 400 shown in FIG. 4A. In example storage block WAN transmission order 405, only the most recent version of each storage block in storage block write WAN queue 400 is communicated to the data center via the WAN. For example, write operation 5 in storage block write WAN queue 400 is the most recent version of storage block 4. Similarly, write operations 7, 8, 9, and 10 in storage block write WAN queue 400 are the most recent versions of storage blocks 3, 1, 2, and 5, respectively. Thus, write operations 5, 7, 8, 9, and 10 are the only write operations in storage block write WAN queue 400 that need to be transmitted to the physical storage array at the data center, as shown by example storage block WAN transmission order 405. The remaining storage block write operations in the storage block write WAN queue 400 may be discarded.

The most recent version policy shown by FIG. 4B reduces the WAN bandwidth required, because multiple versions of the same storage block need not be transmitted. However, by ignoring the write ordering dependencies of the original sequence of write operations, the virtual storage array data on the physical storage array may not be internally consistent until all of the write operations in the storage block write cache have been processed, if necessary, and transmitted back to the physical storage device at the data center.

Additionally, this policy does not preserve consistent snapshots of the virtual storage array, because some write operations prior to a snapshot may be omitted from the storage block WAN transmission order 405 if there are further writes to the same storage block after the snapshot time. For example, write operations 1, 2, and 3 from the storage block write WAN queue 400, which occur before the time of snapshot A, are omitted from the storage block WAN transmission order 405. Thus, snapshot A will not be internally consistent because it is missing the most recent versions of storage blocks 4 and 3 prior to the time of snapshot A.
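
The most recent version policy can be sketched as follows. The queue contents below are reconstructed from the write operations named in this description of FIGS. 4A-4C; the function name, the tuple layout, and the placement of snapshot A after write operation 3 are assumptions made for illustration only.

```python
# (write operation number, storage block written), reconstructed from the text.
WAN_QUEUE = [(1, 4), (2, 3), (3, 4), (4, 3), (5, 4),
             (6, 2), (7, 3), (8, 1), (9, 2), (10, 5)]

SNAPSHOT_A_AFTER_OP = 3  # assumed: snapshot A falls after write operation 3


def most_recent_versions(queue):
    """Return the write operations that carry the newest version of each block."""
    latest = {}
    for op, block in queue:
        latest[block] = op  # a later write to the same block supersedes earlier ones
    return sorted(latest.values())


kept = most_recent_versions(WAN_QUEUE)
print(kept)  # [5, 7, 8, 9, 10]

# Writes made before snapshot A that this policy never transmits.
dropped_before_a = [op for op, _ in WAN_QUEUE
                    if op <= SNAPSHOT_A_AFTER_OP and op not in kept]
print(dropped_before_a)  # [1, 2, 3]
```

The second list shows why snapshot A cannot be reconstructed under this policy: every write made before it is discarded.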

In another embodiment of the invention, a third write order preservation policy forwards the most recently written versions of storage blocks before each snapshot time. FIG. 4C illustrates an example storage block WAN transmission order 410 according to this embodiment of the invention. Example storage block WAN transmission order 410 is based on the example storage block write WAN queue 400 shown in FIG. 4A. In example storage block WAN transmission order 410, the most recent versions of each storage block before each snapshot time in storage block write WAN queue 400 are communicated to the data center via the WAN.

For example, storage block write WAN queue 400 includes two snapshot times, snapshot A and snapshot B. For each snapshot time, an embodiment of the storage block write cache forwards only the most recent version of storage blocks updated by write operations prior to this snapshot time. For example, storage block 4 is updated by write operations 1 and 3 and storage block 3 is updated by write operation 2 prior to snapshot time A. In this example, the storage block WAN transmission order 410 output by the storage block write cache will include write operations 2 and 3 to update storage blocks 3 and 4, reflecting the most recent updates of these storage blocks prior to the snapshot time A. In this example, write operation 1 is omitted because write operation 3 is a more recent update to the same storage block before the snapshot time A.

Similarly, the storage block WAN transmission order 410 includes write operations 5, 6, and 7, reflecting the most recent updates of storage blocks 4, 2, and 3, respectively, prior to the snapshot time B. In this example, the storage block WAN transmission order 410 includes multiple versions of the same storage block if there are one or more snapshots between the associated write operations. For example, write operations 3 and 5 are both included in storage block WAN transmission order 410 because they update storage block 4 prior to and following the snapshot time A.

Additionally, the storage block WAN transmission order 410 includes write operations 8, 9, and 10, which are the most recent updates to storage blocks 1, 2, and 5, respectively, following snapshot time B.

In this embodiment, although the physical storage array may contain an inconsistent view of the virtual storage array data at some arbitrary points in time, this embodiment ensures that the virtual storage array data will be internally consistent at the times of snapshots.
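
The snapshot-aware policy can be sketched by applying the most recent version rule separately within each interval between snapshots. The queue contents are reconstructed from the write operations named in this description; the function name and the snapshot positions (snapshot A after write operation 3, snapshot B after write operation 7) are assumptions inferred from the example rather than taken from the figures.

```python
# (write operation number, storage block written), reconstructed from the text.
WAN_QUEUE = [(1, 4), (2, 3), (3, 4), (4, 3), (5, 4),
             (6, 2), (7, 3), (8, 1), (9, 2), (10, 5)]

SNAPSHOT_AFTER_OPS = [3, 7]  # assumed positions of snapshots A and B


def snapshot_aware_order(queue, snapshot_after_ops):
    """Keep the newest version of each block within every inter-snapshot interval."""
    boundaries = sorted(snapshot_after_ops)
    order, interval, next_boundary = [], {}, 0
    for op, block in queue:
        # Close every interval whose snapshot precedes this write operation.
        while next_boundary < len(boundaries) and op > boundaries[next_boundary]:
            order.extend(sorted(interval.values()))
            interval = {}
            next_boundary += 1
        interval[block] = op  # supersedes older writes in the same interval only
    order.extend(sorted(interval.values()))  # writes after the last snapshot
    return order


print(snapshot_aware_order(WAN_QUEUE, SNAPSHOT_AFTER_OPS))
# [2, 3, 5, 6, 7, 8, 9, 10]
```

With no snapshots the function reduces to the most recent version policy, and with a snapshot after every write it reduces to strict in-order forwarding.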

As discussed above, the data of a virtual storage array may be stored in a physical storage array or other data storage device. In some applications, such as with virtual machine applications, the physical storage blocks used by the virtual storage array belong to a virtual machine file system, such as VMFS. In these applications, there may be many layers of abstraction between virtual storage array storage blocks and the high-level data structure entities used by a virtual machine application and its hosted applications. Because of this, embodiments of the invention may perform multiple transformations to identify high-level data structure entities corresponding with given virtual storage array storage blocks and, once these high-level data structure entities are identified, may perform multiple optimizations to attempt to predict and prefetch virtual storage array storage blocks that will be requested by a storage client in the near future.

FIG. 5 illustrates an example arrangement 500 for successively applying transformations and optimizations to improve virtualized data storage system performance according to an embodiment of the invention. In example 500, successive levels of translation may be used to convert storage block requests to corresponding intermediate level data structure entities and then into corresponding high-level data structure entities. Example arrangement 500 includes a physical data storage system 505, such as a physical data storage array or file server. The physical data storage system 505 may be associated with a file system or volume manager that provides an interface for accessing physical storage blocks. In this example arrangement 500, a virtual storage array interface receives a request for a virtual storage array storage block from a storage client. This request for a virtual storage array storage block is converted by one or more virtual storage array interfaces to a request 507 for a corresponding physical storage block in the physical data storage system 505.

To identify additional physical storage blocks for prefetching, example arrangement 500 includes a physical storage block to virtual machine storage structure translation module 510. Module 510 maps a given physical storage block to a corresponding portion of a virtual machine storage structure 515. For example, virtual machine storage structure 515 may be a VMFS storage volume. The VMFS storage volume appears as a logical storage unit, such as a LUN, to the virtual storage array interface. In this example, the VMFS storage volume may include multiple virtual machine disk images. Although the VMFS storage volume appears as a single logical storage unit to the storage client, each disk image within the VMFS storage volume appears to a virtual machine application as a separate virtual logical storage unit. In this example, module 510 may identify a portion of a virtual logical storage unit within the VMFS storage volume as corresponding with the requested physical storage block.

Module 520 maps the identified portion of a virtual machine storage structure, such as a virtual logical storage unit within a VMFS storage volume, to one or more corresponding virtual file system storage blocks within a virtual file system 525. Virtual file system 525 may be any type of file system implemented within a virtual logical storage unit. Examples of virtual file systems include FAT, NTFS, and the ext family of file systems. For example, a virtual logical storage unit may be a disk image used by a virtual machine application. The disk image represents data as virtual storage blocks of a virtual data storage device. The virtual storage blocks in this disk image are organized according to the virtual file system 525.

As with physical storage blocks and physical file systems, virtual machine applications and their hosted applications typically access data in terms of files in the virtual file system 525, rather than storage blocks. Moreover, high-level data structure entities within the virtual file system, such as files or directories, may be spread out over multiple non-contiguous virtual storage blocks in the virtual file system 525. Thus, a virtual file system inferred storage structure database 530 and virtual file system block access optimizer 532 leverage an understanding of the semantics and structure of the high-level data structures associated with the virtual storage blocks to predict which virtual storage blocks are likely to be requested by a storage client in the near future. The virtual file system ISSD 530 and virtual file system block access optimizer 532 are similar to the ISSD and block access optimizer, respectively, for physical data storage discussed above.

In arrangement 500, the virtual file system block access optimizer 532 receives an identification of one or more virtual storage blocks in the virtual file system 525 that correspond with the requested physical storage block in request 507. The virtual file system block access optimizer 532 uses the virtual file system ISSD 530 to identify one or more virtual file system high-level data structure entities, such as virtual file system files, corresponding with these virtual file system storage blocks. The virtual file system block access optimizer 532 uses its knowledge of the high-level data structure entities and reactive and/or policy-based prefetching techniques to identify one or more additional high-level data structure entities or portions thereof for prefetching. The virtual file system block access optimizer 532 then uses the virtual file system ISSD 530 to identify additional virtual storage blocks in the virtual file system 525 corresponding with these additional high-level data structure entities or portions thereof. The additional virtual storage blocks in the virtual file system 525 are selected for prefetching.

Once the virtual file system block access optimizer 532 has selected one or more virtual file system storage blocks for prefetching, a request 533 for these virtual file system storage blocks is generated. In an embodiment of arrangement 500, module 520 translates the prefetch request 533 for virtual file system storage blocks into an equivalent prefetch request 535 for a portion of the virtual machine storage structure. Then, module 510 translates the prefetch request 535 for a portion of the virtual machine storage structure into an equivalent prefetch request 537 for physical storage blocks in the physical data storage system 505. The physical storage blocks indicated by request 537 correspond with the virtual file system storage blocks from request 533. These requested physical storage blocks may be retrieved from the physical data storage system 505 and communicated via the WAN to a branch location virtual storage array interface for storage in a storage block read cache.
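
The round trip through arrangement 500 can be sketched with lookup tables standing in for the translation modules and the ISSD. Everything in the sketch is hypothetical: the block numbers, the disk image name, the file names, the collapsing of modules 510 and 520 into a single table, and the simple policy of prefetching the remaining blocks of whichever file contains the requested block.

```python
# Stand-in for modules 510 and 520 combined (forward direction):
# physical block -> (disk image, virtual file system block). Values are invented.
PHYS_TO_VIRT = {
    100: ("vm1.img", 0),
    101: ("vm1.img", 1),
    102: ("vm1.img", 2),
    103: ("vm1.img", 3),
}
# Reverse direction, used to turn a virtual prefetch request into physical blocks.
VIRT_TO_PHYS = {virt: phys for phys, virt in PHYS_TO_VIRT.items()}

# Stand-in for the virtual file system ISSD 530: file -> virtual blocks it occupies.
ISSD = {
    "vm1.img": {
        "/boot/kernel": [0, 2],   # spread over non-contiguous virtual blocks
        "/etc/config": [1],
        "/var/log/messages": [3],
    }
}


def prefetch_candidates(physical_block):
    """Map a requested physical block to a file, then to that file's other blocks."""
    image, vblock = PHYS_TO_VIRT[physical_block]
    files = [name for name, blocks in ISSD[image].items() if vblock in blocks]
    # Assumed policy: prefetch the rest of every file containing the requested block.
    wanted = {b for name in files for b in ISSD[image][name]} - {vblock}
    return sorted(VIRT_TO_PHYS[(image, b)] for b in wanted)


print(prefetch_candidates(100))  # [102]
```

A block-level prefetcher that only looked at neighbouring block numbers would have fetched block 101, which belongs to an unrelated file in this toy layout.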

Arrangement 500 is one example for successively applying transformations and optimizations to improve virtualized data storage system performance according to an embodiment of the invention. Further embodiments of the invention may apply any number of successive transformations to physical storage blocks to identify associated high-level data structure entities. Additionally, once one or more associated high-level data structure entities have been identified, embodiments of the invention may apply optimizations at the level of high-level data structure entities or at any lower level of abstraction. For example, optimizations may be performed at the level of virtual machine file system files, virtual machine file system storage blocks, virtual machine storage structures, physical storage blocks, and/or at any other intermediate data structure level of abstraction.

FIG. 6 illustrates a method 600 of creating a data storage snapshot in a virtualized data storage system according to an embodiment of the invention. Method 600 begins with step 605 initiating a virtual storage array checkpoint. A virtual storage array checkpoint may be initiated automatically by a branch location virtual storage array interface according to a schedule or based on criteria, such as the amount of data changed since the last checkpoint. In a further embodiment, a virtual storage array checkpoint may be initiated in response to a request for a virtual storage array snapshot from a system administrator or administration application.

To create a virtual storage array checkpoint, step 610 sets the branch location virtual storage array interface to a quiescent state. This entails completing any pending operations with storage clients (though not necessarily background operations between the branch location and data center virtual storage array interfaces, such as transferring new or updated storage blocks from the storage block write cache to the data center via the WAN). While in the quiescent state, the branch location virtual storage interface will not accept any new storage operations from storage clients.

Once the branch location virtual storage array interface is set to a quiescent state, step 615 identifies new or updated storage blocks in the branch location virtual storage array cache. These new or updated storage blocks include data that has been created or updated by storage clients but has yet to be transferred via the WAN back to the data center LAN for storage in the physical data storage array.

Once all of the updated storage blocks have been identified, step 615 creates a checkpoint data structure. The checkpoint data structure specifies a time of checkpoint creation and the set of new and updated storage blocks at that moment of time. Following the creation of the checkpoint data structure, step 620 reactivates the branch location's virtual storage array. The branch location virtual storage array interface can resume servicing storage operations from storage clients. Additionally, the branch location virtual storage array interface may resume transferring new or updated storage blocks via the WAN to the data center LAN for storage in the physical data storage array. In a further embodiment, the virtual storage array cache may maintain a copy of an updated storage block even after a copy is transferred back to the data center LAN for storage. This allows subsequent snapshots to be created based on this data.

In an embodiment, following the reactivation of the virtual storage array, the branch location virtual storage array interface preserves the updated storage blocks specified by the checkpoint data structure from further changes. If a storage client attempts to update a storage block that is associated with a checkpoint, an embodiment of the branch location virtual storage array interface creates a duplicate of this storage block in the virtual storage array cache to store the updated data. By making a copy of this storage block, rather than replacing it with further updated data, this embodiment preserves the data of this storage block at the time of the checkpoint for potential future reference.
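
Checkpoint creation and the preservation of checkpointed blocks can be sketched as follows. This is an illustration under stated assumptions: the class name BranchCache and its fields are hypothetical, a lock stands in for the quiescent state (new writes wait rather than being refused), and block data is held as immutable bytes so that a later write stores a new value without altering the version recorded by a checkpoint.

```python
import threading
import time


class BranchCache:
    """Toy branch cache with checkpoints. All names here are hypothetical."""

    def __init__(self):
        self._lock = threading.Lock()  # held while the cache is quiescent
        self.dirty = {}                # block_id -> newest data not yet at the data center
        self.checkpoints = []          # checkpoint data structures, oldest first

    def write(self, block_id, data):
        with self._lock:
            # Rebinding the entry leaves any checkpointed version untouched,
            # which plays the role of the duplicate block described above.
            self.dirty[block_id] = data

    def create_checkpoint(self):
        with self._lock:  # steps 610 to 620: quiesce, record, reactivate
            checkpoint = {"time": time.time(), "blocks": dict(self.dirty)}
            self.checkpoints.append(checkpoint)
        return checkpoint


cache = BranchCache()
cache.write(4, b"v1")
cp = cache.create_checkpoint()
cache.write(4, b"v2")  # update after the checkpoint
print(cp["blocks"][4], cache.dirty[4])  # b'v1' b'v2'
```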

Optionally, an embodiment of method 600 may initiate one or more additional virtual storage array checkpoints at later times or in response to criteria or conditions. Embodiments of the branch location virtual storage array interface may maintain any arbitrary number of checkpoint data structures and automatically delete outdated checkpoint data structures. For example, a branch location virtual storage interface may maintain only the most recently created checkpoint data structure, or checkpoint data structures from the beginning of the most recent business day and the most recent hour.

At some point, a system administrator or administration application may request a snapshot of the virtual storage array data. A snapshot of the virtual storage array data represents the complete set of virtual storage array data at a specific moment of time. Step 625 receives a snapshot request. In response to a snapshot request, step 630 transfers a copy of the appropriate checkpoint data structure from the branch location virtual storage array interface to the data center virtual storage interface. Additionally, step 630 transfers a copy of any updated storage blocks specified by this checkpoint data structure from the branch location virtual storage array interface to the data center virtual storage array interface for storage in the physical storage array.

In an embodiment of step 630, the data center virtual storage array interface creates a snapshot of the data of the virtual storage array. The snapshot includes a copy of all of the virtual storage array data in the physical data storage array unchanged from the time of creation of the checkpoint data structure. The snapshot also includes a copy of the updated storage blocks specified by the checkpoint data structure. An embodiment of the data center virtual storage array interface may store the snapshot in the physical storage array or using a data backup. In an embodiment, the data center virtual storage array interface automatically sends storage operations to the physical storage array interface to create a snapshot from a checkpoint data structure. These storage operations can be carried out in the background by the data center virtual storage array interface in addition to translating virtual storage array operations from one or more branch location virtual storage array interfaces into corresponding physical storage array operations.
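
Assembling a snapshot at the data center can be sketched as an overlay of the checkpointed blocks on the data already held by the physical storage array. The function name, the dictionary layout of the checkpoint, and the sample block contents are hypothetical, and the sketch assumes the physical storage array has received no writes made after the checkpoint time.

```python
def build_snapshot(physical_blocks, checkpoint):
    """Overlay the checkpoint's updated blocks on a copy of the physical array data."""
    snapshot = dict(physical_blocks)        # blocks unchanged since the checkpoint
    snapshot.update(checkpoint["blocks"])   # new and updated blocks from the branch
    return snapshot


# Invented sample data: block_id -> contents.
physical = {1: b"p1", 3: b"p3", 4: b"p4"}
checkpoint = {"time": 0.0, "blocks": {4: b"v1", 5: b"new"}}

snapshot = build_snapshot(physical, checkpoint)
assert snapshot == {1: b"p1", 3: b"p3", 4: b"v1", 5: b"new"}
assert physical[4] == b"p4"  # the live array data is not modified by the sketch
```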

Embodiments of the invention can implement virtual storage array interfaces at the branch and/or data center as standalone devices or as part of other devices, computer systems, or applications. FIG. 7 illustrates an example computer system capable of implementing a virtual storage array interface according to an embodiment of the invention. FIG. 7 is a block diagram of a computer system 2000, such as a personal computer or other digital device, suitable for practicing an embodiment of the invention. Embodiments of computer system 2000 may include dedicated networking devices, such as wireless access points, network switches, hubs, routers, hardware firewalls, network traffic optimizers and accelerators, network attached storage devices, storage array network interfaces, and combinations thereof.

Computer system 2000 includes a central processing unit (CPU) 2005 for running software applications and optionally an operating system. CPU 2005 may be comprised of one or more processing cores. In a further embodiment, CPU 2005 may execute virtual machine software applications to create one or more virtual processors capable of executing additional software applications and optional additional operating systems. Virtual machine applications can include interpreters, recompilers, and just-in-time compilers to assist in executing software applications within virtual machines. Additionally, one or more CPUs 2005 or associated processing cores can include virtualization-specific hardware, such as additional register sets, memory address manipulation hardware, additional virtualization-specific processor instructions, and virtual machine state maintenance and migration hardware.

Memory 2010 stores applications and data for use by the CPU 2005. Examples of memory 2010 include dynamic and static random access memory. Storage 2015 provides non-volatile storage for applications and data and may include fixed or removable hard disk drives, flash memory devices, ROM memory, and CD-ROM, DVD-ROM, Blu-ray, or other magnetic, optical, or solid state storage devices. In an embodiment, storage 2015 includes multiple storage devices configured to act as a storage array for improved performance and/or reliability. In a further embodiment, storage 2015 includes a storage array network utilizing a storage array network interface and storage array network protocols to store and retrieve data. Examples of storage array network interfaces suitable for use with embodiments of the invention include Ethernet, Fibre Channel, IP, and InfiniBand interfaces. Examples of storage array network protocols include ATA, Fibre Channel Protocol, and SCSI. Various combinations of storage array network interfaces and protocols are suitable for use with embodiments of the invention, including iSCSI, HyperSCSI, Fibre Channel over Ethernet, and iFCP.

Optional user input devices 2020 communicate user inputs from one or more users to the computer system 2000, examples of which may include keyboards, mice, joysticks, digitizer tablets, touch pads, touch screens, still or video cameras, and/or microphones. In an embodiment, user input devices may be omitted and computer system 2000 may present a user interface to a user over a network, for example using a web page or network management protocol and network management software applications.

Computer system 2000 includes one or more network interfaces 2025 that allow computer system 2000 to communicate with other computer systems via an electronic communications network, and may include wired or wireless communication over local area networks and wide area networks such as the Internet. Computer system 2000 may support a variety of networking protocols at one or more levels of abstraction. For example, computer system 2000 may support networking protocols at one or more layers of the seven layer OSI network model. An embodiment of network interface 2025 includes one or more wireless network interfaces adapted to communicate with wireless clients and with other wireless networking devices using radio waves, for example using the 802.11 family of protocols, such as 802.11a, 802.11b, 802.11g, and 802.11n.

An embodiment of the computer system 2000 may also include a wired networking interface, such as one or more Ethernet connections to communicate with other networking devices via local or wide-area networks.

The components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, are connected via one or more data buses 2060. Additionally, some or all of the components of computer system 2000, including CPU 2005, memory 2010, data storage 2015, user input devices 2020, and network interface 2025, may be integrated together into one or more integrated circuits or integrated circuit packages. Furthermore, some or all of the components of computer system 2000 may be implemented as application specific integrated circuits (ASICs) and/or programmable logic.

Further embodiments can be envisioned by one of ordinary skill in the art after reading the attached documents. For example, embodiments of the invention can be used with any number of network connections and may be added to any type of network device, client or server computer, or other computing device in addition to the computer illustrated above. In other embodiments, combinations or sub-combinations of the above disclosed invention can be advantageously made. The block diagrams of the architecture and flow charts are grouped for ease of understanding. However, it should be understood that combinations of blocks, additions of new blocks, re-arrangement of blocks, and the like are contemplated in alternative embodiments of the present invention.

The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense. It will, however, be evident that various modifications and changes may be made thereunto without departing from the broader spirit and scope of the invention as set forth in the claims.

1. A method of optimizing a block storage protocol access to a block storage device via a wide area network, the method comprising: receiving a first storage block for storage in a storage block cache at a first network location; determining if the storage block cache has sufficient capacity to store the first storage block; and in response to the determination that the storage block cache does not have sufficient capacity for the first storage block: selecting a first high-level data structure entity; identifying at least a second storage block associated with the high-level data structure entity and stored in the storage block cache; and removing at least the second storage block from the storage block cache.
 2. The method of claim 1, wherein selecting a first high-level data structure entity comprises: selecting the first high-level data structure entity based on its infrequent access by a storage client.
 3. The method of claim 1, wherein selecting a first high-level data structure entity comprises: selecting a third storage block stored in the storage block cache; and selecting the first high-level data structure entity based on its correspondence with the third storage block.
 4. The method of claim 3, wherein the third storage block is selected based on its infrequent access by a storage client.
 5. The method of claim 4, comprising: removing the third storage block from the storage block cache.
 6. The method of claim 1, wherein the second storage block is identified as an infrequently accessed portion of the first high-level data structure entity.
 7. The method of claim 7, wherein the second storage block does not include metadata of the first high-level data structure entity.
 8. The method of claim 1, wherein the first storage block is received from a storage client in association with a storage write operation.
 9. The method of claim 1, wherein the first storage block is received from a data storage in association with a storage block prefetching operation.
 10. The method of claim 9, wherein the data storage is connected with a wide-area network at a first network location and the storage block cache is connected with the wide-area network location at a second network location.
 11. A method of optimizing a block storage protocol access to a block storage device via a wide area network, the method comprising: receiving a storage block cache replacement policy; selecting at least a portion of a first high-level data structure entity identified by the storage block cache replacement policy; identifying at least a first storage block associated with the selected portion of the first high-level data structure entity; and selecting the first storage block for retention in a storage block cache connected with a wide-area network at a first network location.
 12. The method of claim 11, wherein a copy of the first storage block is stored in the storage block cache.
 13. The method of claim 11, wherein a copy of the first storage block is not stored in the storage block cache, the method comprising: retrieving the first storage block from a data storage connected with the wide area network at a second network location.
 14. The method of claim 11, wherein the selected portion of the first high-level data structure entity is identified as a frequently accessed portion of the first high-level data structure entity by the storage block cache replacement policy.
 15. The method of claim 14, wherein the second storage block includes metadata of the first high-level data structure entity.
 16. A method of optimizing a block storage protocol write access to a block storage device via a wide area network, the method comprising: receiving a sequence of storage block write operations; selecting a first storage block write operation included in the sequence of storage block write operations, wherein the first storage block operation includes a first version of a storage block; determining if the sequence of storage block write operations includes a second storage block write operation including a second version of the storage block, wherein the second storage block write operation is more recent than the first storage block write operation; in response to the determination that the sequence of storage block write operations does not include the second storage block write operation including the second version of the storage block, communicating the first version of the storage block via a wide area network to a data storage connected with the wide area network at a first network location; and in response to the determination that the sequence of storage block write operations includes the second storage block write operation including the second version of the storage block, communicating the second version of the storage block via the wide area network to the data storage.
 17. The method of claim 16, comprising: in response to receiving the sequence of storage block requests, caching the sequence of storage block requests in a storage block cache; and following the communication of the second version of the storage block to the data storage, removing the first storage block write operation and the first version of the storage block from the storage block cache.
 18. The method of claim 17, wherein the storage block cache is connected with the wide-area network at a second network location, the method comprising: following the communication of the second version of the storage block to the data storage, retaining the first version of the storage block in the storage block cache for read access by a storage client connected with the wide-area network at the second location.
 19. The method of claim 16, wherein determining if the sequence of storage block write operations includes the second storage block write operation including the second version of the storage block comprises: searching the sequence of storage block write operations from a time associated with the first storage block write operation up to a snapshot time.
 20. The method of claim 16, wherein determining if the sequence of storage block write operations includes the second storage block write operation including the second version of the storage block comprises: searching the sequence of storage block write operations from a time associated with the first storage block write operation up to an end of the sequence of storage block write operations.